Abstract

Various symmetries connected with the purine-pyrimidine content of
DNA sequences are studied in terms of the introduced determinative
degree, a new characteristic of nucleotides which is connected with
codon usage. A numerological explanation of CG pressure is proposed.
A classification of DNA sequences is given. Calculations with real
sequences show that purine-pyrimidine symmetry increases with the
level of organization. A new small parameter which characterizes
the breaking of purine-pyrimidine symmetry is proposed for DNA
theory.
Abstract investigation of the genetic code is a powerful tool for
constructing DNA models and for understanding the organization and
expression of genes. In this direction the study of symmetries, the
application of group theory and the implications of supersymmetry are the
most promising and most in need of further elaboration. In this paper we
consider symmetries connected with the purine-pyrimidine content of DNA
sequences in terms of the previously introduced determinative degree.

We denote a triplet of nucleotides by xyz, where x, y, z = C, T, A, G.
Then redundancy means that an amino acid is fully determined by the first
two nucleotides x and y independently of the third nucleotide z. The
sixteen possible doublets xy fall into 2 octets according to their ability
to determine an amino acid. Eight doublets have more "strength" in the
sense that they encode an amino acid independently of the third base; for
the other eight ("weak") doublets the third base determines the content of
the codon. In general, the transition from the "powerful" octet to the
"weak" octet can be obtained by the exchange $C \Leftrightarrow A$,
$G \Leftrightarrow T$, which we name the "star operation (*)" and call
purine-pyrimidine inversion. Thus, if in addition we take into account GC
pressure in evolution and third place preferences during codon-anticodon
pairing, then the 4 nucleotides can be arranged in descending order in the
following way:

Pyrimidine Purine Pyrimidine Purine 

C G T A (1) 

very "strong" "strong" "weak" very "weak" 

Now we introduce a numerical characteristic of the empirical "strength",
the determinative degree $d_x$ of nucleotide x, and make the transition
from a qualitative to a quantitative description of the genetic code
structure. It is seen from (1) that the determinative degree of a
nucleotide can take the values $d_x = 1, 2, 3, 4$ in order of increasing
"strength". If we denote the determinative degree as an upper index of the
nucleotide, then the four bases (1) can be presented as the row vector
$V = (C^{(4)} \; G^{(3)} \; T^{(2)} \; A^{(1)})$. Then the exterior product
$M = V \times V$ represents the doublet matrix M and the corresponding
rhombic code [10], and the triple exterior product
$K = V \times V \times V$ corresponds to the cubic matrix model of the
genetic code, which has been described in terms of the determinative
degree. To calculate the determinative degree of doublets xy we use the
following additivity assumption

$d_{xy} = d_x + d_y$, (2)
which holds for triplets and for any nucleotide sequence. Then each of the
64 elements (codons) of the cubic matrix K will have a novel numerical
characteristic, the determinative degree of the codon
$d_{xyz} = d_{codon} = d_x + d_y + d_z$, which takes values in the range
from 3 to 12. We can also define the determinative degree of an amino acid
$d_{AA}$ as the arithmetic mean $d_{AA} = \sum d_{codon} / n_{deg}$, where
$n_{deg}$ is its degeneracy (redundancy). This allows us to analyze new
abstract amino acid properties in connection with known biological
properties.
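
The additivity assumption can be illustrated by a short Python sketch. It
assumes the values $d_C = 4$, $d_G = 3$, $d_T = 2$, $d_A = 1$ read off the
ordering (1) and the standard genetic code table; the names DEGREE,
CODON_TABLE, degree and amino_acid_degrees are hypothetical helpers
introduced only for this illustration.

```python
from itertools import product

# Determinative degrees of nucleotides, read off ordering (1): C > G > T > A.
DEGREE = {"C": 4, "G": 3, "T": 2, "A": 1}

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def degree(seq):
    """Additive determinative degree of any nucleotide string, as in (2)."""
    return sum(DEGREE[x] for x in seq)


def amino_acid_degrees():
    """Arithmetic mean of codon degrees for each amino acid (one-letter codes)."""
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(aa, []).append(degree(codon))
    return {aa: sum(ds) / len(ds) for aa, ds in groups.items()}


d_aa = amino_acid_degrees()
print(degree("CCC"), degree("AAA"), d_aa["P"], d_aa["K"])
```

Dividing by the number of codons of each amino acid plays the role of
$n_{deg}$ in the definition of $d_{AA}$.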

Let us consider a numerical description of an idealized DNA sequence
as a double helix of two codon strands connected by the complementary
conditions. The strands are described by the four numbers
$(n_C, n_G, n_T, n_A)$ and $(m_C, m_G, m_T, m_A)$ respectively, where
$n_x$ ($m_x$) is the number of nucleotides x in the first (second) strand.
In terms of $n_x$ and $m_x$ the complementary conditions are

$n_C = m_G$, $m_C = n_G$, $n_T = m_A$, $m_T = n_A$. (3)

Chargaff's rules for a double-helix DNA sequence read: 1) the total
quantities of purines and pyrimidines are equal,
$N_A + N_G = N_C + N_T$; 2) the total quantity of adenine and cytosine
equals the total quantity of guanine and thymine,
$N_A + N_C = N_T + N_G$; 3) the total quantity of adenine equals that of
thymine, $N_A = N_T$, and the total quantity of cytosine equals that of
guanine, $N_C = N_G$; 4) the ratio of adenine and thymine to guanine and
cytosine $v = (N_A + N_T) / (N_C + N_G)$ is approximately constant for
each species. Usually Chargaff's rules are defined through macroscopic
molar parts, which are proportional to the absolute numbers of nucleotides
in DNA. If we consider a DNA double-helix sequence, then
$N_x = n_x + m_x$. In terms of $n_x$ and $m_x$ the first three Chargaff's
rules lead to equations which are obvious identities if the
complementarity (3) holds. From the fourth Chargaff's rule it follows that
the specificity coefficient $v_{nm}$ for two given strands is

$v_{nm} = \frac{n_A + m_A + n_T + m_T}{n_C + m_C + n_G + m_G}$. (4)

The complementarity (3) leads to the equality of the coefficients v of
each strand, $v_{nm} = v_n = v_m = v$, and v is connected with the GC
content $p_{CG}$ in the double-helix DNA as $p_{CG} = 1 / (1 + v)$.

We consider another important coefficient: the ratio k of purines to
pyrimidines. For two strands, from the first Chargaff's rule we obviously
derive $k_{nm} = 1$. But for each strand we have

$k_n = \frac{n_G + n_A}{n_C + n_T}$, $k_m = \frac{m_G + m_A}{m_C + m_T}$, (5)

which satisfy the equation $k_n k_m = 1$ following from complementarity.
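
A minimal Python sketch of the counting behind (3)-(5) is given below. It
assumes that v is the ratio of A+T to C+G, as in the formula (4), and that
k is the ratio of purines to pyrimidines; the toy strand and the helper
names counts and complementary are hypothetical and serve only as an
illustration.

```python
from collections import Counter
from fractions import Fraction

PAIR = {"C": "G", "G": "C", "T": "A", "A": "T"}


def counts(strand):
    """Numbers n_C, n_G, n_T, n_A of one strand, returned as a dict."""
    tally = Counter(strand)
    return {x: tally[x] for x in "CGTA"}


def complementary(strand):
    """Base-by-base complement; strand direction does not affect the counts."""
    return "".join(PAIR[x] for x in strand)


strand = "CCGATTTACA"                      # hypothetical toy strand
n, m = counts(strand), counts(complementary(strand))
N = {x: n[x] + m[x] for x in "CGTA"}       # double-helix totals N_x = n_x + m_x

# Specificity coefficients of each strand and of the double helix, as in (4).
v_n = Fraction(n["A"] + n["T"], n["C"] + n["G"])
v_m = Fraction(m["A"] + m["T"], m["C"] + m["G"])
v_nm = Fraction(N["A"] + N["T"], N["C"] + N["G"])

# GC content of the double helix and purine/pyrimidine ratios (5).
p_cg = Fraction(N["C"] + N["G"], sum(N.values()))
k_n = Fraction(n["G"] + n["A"], n["C"] + n["T"])
k_m = Fraction(m["G"] + m["A"], m["C"] + m["T"])

print(v_n, v_m, v_nm, p_cg, k_n, k_m)
```

Exact rational arithmetic (Fraction) is used so that identities such as
$k_n k_m = 1$ can be checked without rounding error. The ratios are
undefined for strands without C and G or without pyrimidines.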

Let us introduce the determinative degree of each strand, exploiting the
additivity assumption, as

$d_n = 4 \cdot n_C + 3 \cdot n_G + 2 \cdot n_T + 1 \cdot n_A$, (6)
$d_m = 4 \cdot m_C + 3 \cdot m_G + 2 \cdot m_T + 1 \cdot m_A$. (7)

The values $d_n$ and $d_m$ can be viewed as characteristics of the
empirical "strength" of strands, i.e. a "strand generalization" of (1).
Then we define the sum and difference "strength" of a double-helix
sequence by

$d_+ = d_n + d_m$, $d_- = d_n - d_m$. (8)

The first variable $d_+$ can be treated as the total empirical "strength"
of DNA (or its fragment). Taking into account the complementary conditions
(3), we obtain $d_+$ through the variables of one strand:

$d_+ = 7 \cdot (n_C + n_G) + 3 \cdot (n_T + n_A)$. (9)

We can also present $d_+$ through macroscopically determined variables as
follows: $d_+ = 7 \cdot N_C + 3 \cdot N_A = 7 \cdot N_G + 3 \cdot N_T$, or
through the GC and AT contents as
$d_+ = \frac{7}{2} \cdot N_{C+G} + \frac{3}{2} \cdot N_{A+T}$, where
$N_{C+G} = N_C + N_G$ and $N_{A+T} = N_A + N_T$.

To give sense to the difference $d_-$ we derive

$d_- = n_C + n_T - n_G - n_A$. (10)

We see that the star operation obviously acts as $(d_+)^* = d_+$ and
$(d_-)^* = -d_-$. From (9)-(10) the main statement follows:

The biological sense of the determinative degree d is contained
in the following purine-pyrimidine relations:

1) The sum of the determinative degrees of the matrix and
complementary strands in DNA (or its fragment) equals

$d_+ = \frac{7}{2} \cdot N_{C+G} + \frac{3}{2} \cdot N_{A+T}$. (11)

2) The difference of the determinative degrees between the matrix
and complementary strands in DNA (or its fragment) exactly equals
the difference between pyrimidines and purines in one strand

$d_- = n_{pyrimidines} - n_{purines}$, (12)

where $n_{pyrimidines} = n_C + n_T$ and $n_{purines} = n_G + n_A$,
or it is equal to the difference of purines or pyrimidines between
strands

$d_- = n_{pyrimidines} - m_{pyrimidines} = m_{purines} - n_{purines}$. (13)
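
The relations (11)-(13) can be checked numerically with a small Python
sketch. It assumes the weights 4, 3, 2, 1 for C, G, T, A from (6)-(7); the
toy strand and the helper names strand_degree, strengths, pyrimidines and
purines are hypothetical.

```python
WEIGHT = {"C": 4, "G": 3, "T": 2, "A": 1}   # determinative degrees used in (6), (7)
PAIR = {"C": "G", "G": "C", "T": "A", "A": "T"}


def strand_degree(strand):
    """Additive determinative degree of one strand, as in (6) and (7)."""
    return sum(WEIGHT[x] for x in strand)


def strengths(strand):
    """Return (d_plus, d_minus) of the double helix built on `strand`, as in (8)."""
    partner = "".join(PAIR[x] for x in strand)
    d_n, d_m = strand_degree(strand), strand_degree(partner)
    return d_n + d_m, d_n - d_m


def pyrimidines(strand):
    return strand.count("C") + strand.count("T")


def purines(strand):
    return strand.count("G") + strand.count("A")


strand = "CCGATTTACA"                        # hypothetical toy strand
partner = "".join(PAIR[x] for x in strand)
d_plus, d_minus = strengths(strand)

n_cg = 2 * (strand.count("C") + strand.count("G"))   # N_{C+G} of the double helix
n_at = 2 * (strand.count("A") + strand.count("T"))   # N_{A+T} of the double helix

assert 2 * d_plus == 7 * n_cg + 3 * n_at                       # relation (11)
assert d_minus == pyrimidines(strand) - purines(strand)        # relation (12)
assert d_minus == pyrimidines(strand) - pyrimidines(partner)   # relation (13)
assert d_minus == purines(partner) - purines(strand)           # relation (13)
print(d_plus, d_minus)
```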



We can also find the connection between $d_+$, $d_-$ and the coefficients
k and v as follows:

$d_+ = \frac{1}{2} N_{C+G} (7 + 3v) = N_{C+G} \left( 2 + \frac{3}{2 \cdot p_{CG}} \right)$, (14)
$d_- = n_{pyrimidines} (1 - k_n)$. (15)

If we consider one species for which v = const (or $p_{CG}$ = const), then
we observe that $d_+ \sim N_{C+G}$, which can allow us to connect the
determinative degree with the "second level" of genetic information. On
the other hand, the ratio $\frac{7}{3}$ of the coefficients in (11) can
play a numerological role in explanations of CG pressure, and therefore
$d_+$ can be considered as some kind of "evolutionary strength".
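
As a consistency check of the relations between $d_+$, $d_-$ and the
coefficients v, $p_{CG}$ and $k_n$, the following Python sketch evaluates
them for one set of hypothetical strand counts. It assumes
$v = N_{A+T} / N_{C+G}$, $p_{CG} = 1 / (1 + v)$ and
$k_n = n_{purines} / n_{pyrimidines}$; the function names are
illustrative only.

```python
from fractions import Fraction


def d_plus_from_v(n_cg, v):
    """d_+ = N_{C+G} (7 + 3 v) / 2, with v the ratio of A+T to C+G."""
    return Fraction(n_cg) * (7 + 3 * v) / 2


def d_plus_from_gc(n_cg, p_cg):
    """d_+ = N_{C+G} (2 + 3 / (2 p_CG)), with p_CG the GC content."""
    return Fraction(n_cg) * (2 + Fraction(3) / (2 * p_cg))


def d_minus_from_k(n_pyr, k_n):
    """d_- = n_pyrimidines (1 - k_n), with k_n the purine/pyrimidine ratio."""
    return n_pyr * (1 - k_n)


# Hypothetical single-strand counts n_C, n_G, n_T, n_A.
n_c, n_g, n_t, n_a = 3, 1, 3, 3
n_cg = 2 * (n_c + n_g)                     # N_{C+G} of the double helix
n_at = 2 * (n_t + n_a)                     # N_{A+T} of the double helix
v = Fraction(n_at, n_cg)
p_cg = Fraction(n_cg, n_cg + n_at)
k_n = Fraction(n_g + n_a, n_c + n_t)

# Direct values from the one-strand formulas for d_+ and d_-.
d_plus = 7 * (n_c + n_g) + 3 * (n_t + n_a)
d_minus = n_c + n_t - n_g - n_a

print(d_plus_from_v(n_cg, v), d_plus_from_gc(n_cg, p_cg), d_minus_from_k(n_c + n_t, k_n))
```

For fixed v the first function is linear in $N_{C+G}$, which is the
proportionality $d_+ \sim N_{C+G}$ noted above.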

Now we consider the determinative degree of double-helix sequences in
various extreme cases and classify them. We call a DNA sequence
mononucleotide, dinucleotide, trinucleotide or full if one, two, three or
four of the numbers $n_x$, respectively, are distinct from zero.
Properties of mononucleotide double-helix DNA sequences are given in
Table 1.



Table 1. Mononucleotide DNA

$n_x$            $d_+$     $d_-$     amino acid
$n_C \neq 0$     $7n_C$    $n_C$     Pro
$n_G \neq 0$     $7n_G$    $-n_G$    Gly
$n_T \neq 0$     $3n_T$    $n_T$     Phe
$n_A \neq 0$     $3n_A$    $-n_A$    Lys



The mononucleotide sequences which encode the most extended amino acids
Gly and Lys have negative $d_-$, and the mononucleotide sequences which
encode the amino acids Pro and Phe, with a similar chemical type of
radicals, have positive $d_-$.

The dinucleotide double-helix DNA sequences (without mononucleotide
parts) are described in Table 2.
Table 2. Dinucleotide DNA

$n_x$                        $d_+$             $d_-$           amino acid
$n_C \neq 0, n_G \neq 0$     $7(n_C + n_G)$    $n_C - n_G$     Pro, Arg, Ala, Gly
$n_C \neq 0, n_T \neq 0$     $7n_C + 3n_T$     $n_C + n_T$     Pro, Phe, Leu, Ser
$n_C \neq 0, n_A \neq 0$     $7n_C + 3n_A$     $n_C - n_A$     Pro, Gln, Asn, Thr, His
$n_G \neq 0, n_T \neq 0$     $7n_G + 3n_T$     $n_T - n_G$     Gly, Leu, Val, Cys, Trp
$n_G \neq 0, n_A \neq 0$     $7n_G + 3n_A$     $-n_G - n_A$    Gly, Glu, Arg, Lys
$n_T \neq 0, n_A \neq 0$     $3(n_T + n_A)$    $n_T - n_A$     Leu, Asn, Tyr, TERM



The trinucleotide DNA can be listed in a similar, but more cumbersome
way. The full DNA sequences consist of nucleotides of all four types and
are described by the general formulas for $d_+$ and $d_-$ given above.
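
The classification by the number of nonzero $n_x$ can be sketched in
Python as follows. The function names classify and strengths are
hypothetical; the formulas for $d_+$ and $d_-$ are the one-strand
expressions (9) and (10).

```python
def classify(strand):
    """Name the class of a strand by how many of n_C, n_G, n_T, n_A are nonzero."""
    kinds = sum(1 for x in "CGTA" if x in strand)
    if kinds == 0:
        raise ValueError("strand contains no nucleotides")
    return ("mononucleotide", "dinucleotide", "trinucleotide", "full")[kinds - 1]


def strengths(strand):
    """Return (d_plus, d_minus) from the one-strand counts, as in (9) and (10)."""
    n = {x: strand.count(x) for x in "CGTA"}
    d_plus = 7 * (n["C"] + n["G"]) + 3 * (n["T"] + n["A"])
    d_minus = n["C"] + n["T"] - n["G"] - n["A"]
    return d_plus, d_minus


# Hypothetical toy strands, one per class.
for strand in ("GGGG", "CCTT", "CCGT", "CCGATTTACA"):
    print(strand, classify(strand), strengths(strand))
```

For the strand GGGG this reproduces the second row of Table 1
($d_+ = 7n_G$, $d_- = -n_G$), and for CCTT the second row of Table 2.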

The introduction of the determinative degree allows us to single out a
kind of double-helix DNA sequences which have an additional symmetry.
We call a double-helix sequence purine-pyrimidine symmetric if

$d_- = 0$, (16)

i.e. the difference of the empirical "strengths" of its strands vanishes.
From (10) it follows that

$n_C + n_T = n_G + n_A$, (17)

i.e. $k_n = k_m = 1$, which can be rewritten for one strand as

$n_{pyrimidines} = n_{purines}$ (18)

or as the equality of purines and pyrimidines in the two strands

$n_{pyrimidines} = m_{pyrimidines}$, (19)
$n_{purines} = m_{purines}$. (20)



The purine-pyrimidine symmetry (17) has two particular cases:

1) $n_C = n_G$, $n_T = n_A$ (symmetric DNA), (21)
2) $n_C = n_A$, $n_T = n_G$ (antisymmetric DNA). (22)



The first case corresponds to Chargaff's rule applied to a single strand,
which approximately holds for long sequences [11], and so it would be
interesting to compare the transcription and expression properties of
symmetric and antisymmetric double-helix sequences.
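
A possible way to test a strand for these symmetries is sketched below in
Python. It assumes that the symmetric case means $n_C = n_G$, $n_T = n_A$
(Chargaff's rule within one strand) and that the antisymmetric case means
$n_C = n_A$, $n_T = n_G$; the function names and the returned labels are
hypothetical.

```python
def symmetry_type(n_c, n_g, n_t, n_a):
    """Classify single-strand counts by purine-pyrimidine symmetry."""
    d_minus = n_c + n_t - n_g - n_a        # difference "strength" d_-
    if d_minus != 0:
        return "not purine-pyrimidine symmetric"
    if n_c == n_g and n_t == n_a:
        # Checked first, so a strand with all four counts equal is reported here.
        return "symmetric"
    if n_c == n_a and n_t == n_g:
        return "antisymmetric"
    return "purine-pyrimidine symmetric"


def strand_symmetry(strand):
    """Apply symmetry_type to the counts n_C, n_G, n_T, n_A of a strand."""
    return symmetry_type(*(strand.count(x) for x in "CGTA"))


# Hypothetical toy strands.
for strand in ("CCGGTA", "CCAATG", "CCCTGGAA", "CCGATTTACA"):
    print(strand, strand_symmetry(strand))
```

Exact equality of counts is rarely met by real sequences, so in practice
one would compare the deviation from zero with a tolerance.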
We have made a preliminary analysis of real sequences of several species
taken from GenBank (2000) in terms of the determinative degree. We
considered 10 complete sequences of E. coli (several genes and full
genomic DNA 9-12 min.), 12 complete sequences of Drosophila melanogaster
(crc genes), 10 complete sequences of Homo sapiens Chromosome 22 (various
clones) and 10 complete sequences of Homo sapiens Chromosome 3 (various
clones). We calculated the nucleotide content $N_C$, $N_T$, $N_G$, $N_A$
and the determinative degree characteristics $d_+$, $d_-$,
$q = d_- / d_+$, $k_n$ and $v$ for every sequence. Then we averaged their
values for each species. The result is presented in Table 3.



Table 3. Mean determinative degree characteristics of real sequences

sequence            mean $d_+$    mean $d_-$    mean $q$    mean $k_n$    mean $v$
E.coli              90806         -138          -6.8        1.07          1.38
Drosophila          7325          -70           -8.9        1.09          1.31
Homo sap. Chr.22    337974        6865          1.46        0.987         1.14
Homo sap. Chr.3     806435        -1794         -2.29       1.021         1.55
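
The per-sequence calculation and the averaging over a species can be
sketched in Python as follows. This is only an illustration under stated
assumptions: the two short strands are hypothetical toy data, not GenBank
sequences, and the names characteristics and species_means are ours.

```python
from statistics import mean


def characteristics(strand):
    """d_+, d_-, q, k_n and v of the double helix built on one strand.

    The ratios are undefined for strands without pyrimidines or without C and G.
    """
    n = {x: strand.count(x) for x in "CGTA"}
    d_plus = 7 * (n["C"] + n["G"]) + 3 * (n["T"] + n["A"])
    d_minus = n["C"] + n["T"] - n["G"] - n["A"]
    return {
        "d_plus": d_plus,
        "d_minus": d_minus,
        "q": d_minus / d_plus,
        "k_n": (n["G"] + n["A"]) / (n["C"] + n["T"]),
        "v": (n["T"] + n["A"]) / (n["C"] + n["G"]),
    }


def species_means(strands):
    """Average every characteristic over the sequences of one species."""
    rows = [characteristics(s) for s in strands]
    return {key: mean(row[key] for row in rows) for key in rows[0]}


toy_species = ["CCGATTTACA", "GGATCCAT"]   # hypothetical toy sequences
means = species_means(toy_species)
print(means)
```

Note that the mean of q over sequences generally differs from the ratio of
the mean $d_-$ to the mean $d_+$.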



First of all we observe that all real sequences have high
purine-pyrimidine symmetry (smallness of the parameter q). We also see
that the ratio $k_n$ of purines to pyrimidines in one DNA strand is very
close to unity; therefore we have a new small parameter in the DNA theory,
$(k_n - 1)$ (or q), which characterizes the breaking of purine-pyrimidine
symmetry. This opens the possibility of applying various approximate and
perturbative methods. Second, we notice from Table 3 that the
purine-pyrimidine symmetry increases in the direction from bacteria to
mammals and is maximal for a human chromosome. It would be worthwhile to
carry out a thorough study of purine-pyrimidine symmetry and codon usage
in terms of the introduced determinative degree by statistical methods,
which will be done elsewhere.